reGenotyper: Detecting mislabeled samples in genetic data

Authors

  • Konrad Zych
  • Basten L Snoek
  • Mark Elvin
  • Miriam Rodriguez
  • K Joeri Van der Velde
  • Danny Arends
  • Harm-Jan Westra
  • Morris A Swertz
  • Gino Poulin
  • Jan E Kammenga
  • Rainer Breitling
  • Ritsert C Jansen
  • Yang Li
Abstract

In high-throughput molecular profiling studies, genotype labels can be wrongly assigned at various experimental steps; the resulting mislabeled samples seriously reduce the power to detect the genetic basis of phenotypic variation. We have developed an approach to detect potential mislabeling, recover the "ideal" genotype and identify "best-matched" labels for mislabeled samples. On average, we identified 4% of samples as mislabeled in eight published datasets, highlighting the necessity of applying a "data cleaning" step before standard data analysis.

Similar resources

An Algorithm for Recognizing Mislabeled and Abnormal Samples in Cancer Microarray

Microarray is a high-throughput experimental technology which has been used in many life-science areas especially in medical applications. The sample classification problem is crucial for disease diagnosis and treatment. However, the process of sample labeling can be very complex and partially subjective. Existing studies confirm this phenomenon and show that even a very small number of error s...


Detecting potential labeling errors in microarrays by data perturbation

MOTIVATION Classification is widely used in medical applications. However, the quality of the classifier depends critically on the accurate labeling of the training data. But for many medical applications, labeling a sample or grading a biopsy can be subjective. Existing studies confirm this phenomenon and show that even a very small number of mislabeled samples could deeply degrade the perform...


Finding Originally Mislabels with MD-ELM

This paper presents a methodology which aims at detecting mislabeled samples, with a practical example in the field of bankruptcy prediction. Mislabeled samples are found in many classification problems and can bias the training of the desired classifier. This paper proposes a new method based on Extreme Learning Machine (ELM) which allows for identification of the most probable mislabeled samp...


First Report of a set of Genetic Identities in Prunus Rootstocks by SSR Markers

Prunus rootstocks play an important role in modern horticulture and commercial orchards because they determine a wide range of traits, from compatibility with cultivars to adaptation to biotic and abiotic stresses. In this study, thirty Prunus rootstock samples were tested with 25 microsatellite markers in order to establish their genetic identity and the relationships among them. 17 SSR ma...


Invention and validation of an automated camera system that uses optical character recognition to identify patient name mislabeled samples.

BACKGROUND Mislabeled samples are a serious problem in most clinical laboratories. Published error rates range from 0.39/1000 to as high as 1.12%. Standardization of bar codes and label formats has not yet achieved the needed improvement. The mislabel rate in our laboratory, although low compared with published rates, prompted us to seek a solution to achieve zero errors. METHODS To reduce or...


Journal title:

Volume 12, Issue

Pages -

Publication date: 2017